数据质量是发展医疗保健中值得信赖的AI的关键因素。大量具有控制混杂因素的策划数据集可以帮助提高下游AI算法的准确性,鲁棒性和隐私性。但是,访问高质量的数据集受数据获取的技术难度的限制,并且严格的道德限制阻碍了医疗保健数据的大规模共享。数据合成算法生成具有与真实临床数据相似的分布的数据,可以作为解决可信度AI的发展过程中缺乏优质数据的潜在解决方案。然而,最新的数据合成算法,尤其是深度学习算法,更多地集中于成像数据,同时忽略了非成像医疗保健数据的综合,包括临床测量,医疗信号和波形以及电子保健记录(EHRS)(EHRS) 。因此,在本文中,我们将回顾合成算法,尤其是对于非成像医学数据,目的是在该领域提供可信赖的AI。本教程风格的审查论文将对包括算法,评估,局限性和未来研究方向在内的各个方面进行全面描述。
translated by 谷歌翻译
在深度学习研究中,自学学习(SSL)引起了极大的关注,引起了计算机视觉和遥感社区的兴趣。尽管计算机视觉取得了很大的成功,但SSL在地球观测领域的大部分潜力仍然锁定。在本文中,我们对在遥感的背景下为计算机视觉的SSL概念和最新发展提供了介绍,并回顾了SSL中的概念和最新发展。此外,我们在流行的遥感数据集上提供了现代SSL算法的初步基准,从而验证了SSL在遥感中的潜力,并提供了有关数据增强的扩展研究。最后,我们确定了SSL未来研究的有希望的方向的地球观察(SSL4EO),以铺平了两个领域的富有成效的相互作用。
translated by 谷歌翻译
从合成孔径雷达(SAR)图像建立高度检索,对于城市应用来说,对于城市应用来说,对于SAR数据的复杂性来说,这一极为重视。本文从单个Terrasar-X Spotlight或Stribmap图像中解决了大型城市地区建立高度检索问题的问题。基于雷达观看几何形状,我们提出该问题可以作为边界框回归问题制定,因此允许将高度数据集成在更大的规模上生成地面真实。我们从地理信息系统(GIS)数据中的建筑占用脚印作为互补信息,并提出了一种限制框回归网络,该网络利用建筑物占地面积与其边界框之间的位置关系,允许快速计算。这对于大型应用来说很重要。在高分辨率聚光灯和RILTMAP模式下,使用Terrasar-X图像在四个城市数据集上验证该方法。实验结果表明,与基于速度的R-CNN的方法相比,所提出的网络可以显着降低计算成本,同时保持各个建筑物的高度精度。此外,我们调查了GIS数据对我们所提出的网络的影响,并且本研究表明边界框回归网络对GIS数据中的定位误差具有稳健。该方法具有适用于区域甚至全球范围的潜力。
translated by 谷歌翻译
复杂知识库问题回答是过去十年的一个流行的研究领域。最近的公共数据集导致这一领域的令人鼓舞的结果,但主要涉及英语,只涉及少数问题类型和关系,在更现实的环境和英语以外的语言中妨碍研究。此外,很少有最先进的KBQA模型在Wikidata上培训,是最受欢迎的真实知识库之一。我们提出了CLC-Quad,这是Wikidata的第一个大规模复杂的中文语义解析数据集,以解决这些挑战。我们与数据集一起介绍了一个文本到SPARQL基线模型,可以有效地应答多种类型的复杂问题,例如事实上的问题,双重意图问题,布尔问题和计数问题,以及Wikidata作为背景知识。我们终于分析了SOTA KBQA模型在此数据集中的表现,并确定了中国KBQA面临的挑战。
translated by 谷歌翻译
用于图像分类的最可公开的数据集是单个标签,而图像在我们的日常生活中是固有的多标记。这种注释差距使得许多预先接受的单标准分类模型在实际情况下失败。该注释问题更加关注空中图像:从传感器收集的空中数据自然地覆盖具有多个标签的相对大的陆地面积,而被广泛可用的注释空中数据集(例如,UCM,AID)是单标记的。作为手动注释的多标签空中图像将是时间/劳动,我们提出了一种新的自我校正综合域适应(SCIDA)方法,用于自动多标签学习。 SCIDA是弱监督,即,自动学习多标签图像分类模型,从使用大量的公共可用的单一标签图像。为实现这一目标,我们提出了一种新颖的标签 - 明智的自我校正(LWC)模块,以更好地探索潜在的标签相关性。该模块还使无监督的域适配(UDA)从单个到多标签数据中可能。对于模型培训,所提出的型号仅使用单一标签信息,但不需要先验知识的多标记数据;它预测了多标签空中图像的标签。在我们的实验中,用单标签的MAI-AID-S和MAI-UCM-S数据集接受培训,所提出的模型直接在收集的多场景空中图像(MAI)数据集上进行测试。
translated by 谷歌翻译
Letting a deep network be aware of the quality of its own predictions is an interesting yet important problem. In the task of instance segmentation, the confidence of instance classification is used as mask quality score in most instance segmentation frameworks. However, the mask quality, quantified as the IoU between the instance mask and its ground truth, is usually not well correlated with classification score. In this paper, we study this problem and propose Mask Scoring R-CNN which contains a network block to learn the quality of the predicted instance masks. The proposed network block takes the instance feature and the corresponding predicted mask together to regress the mask IoU. The mask scoring strategy calibrates the misalignment between mask quality and mask score, and improves instance segmentation performance by prioritizing more accurate mask predictions during COCO AP evaluation. By extensive evaluations on the COCO dataset, Mask Scoring R-CNN brings consistent and noticeable gain with different models, and outperforms the state-of-the-art Mask R-CNN. We hope our simple and effective approach will provide a new direction for improving instance segmentation. The source code of our method is available at https:// github.com/zjhuang22/maskscoring_rcnn. * The work was done when Zhaojin Huang was an intern in Horizon Robotics Inc.
translated by 谷歌翻译
Contextual information is vital in visual understanding problems, such as semantic segmentation and object detection. We propose a Criss-Cross Network (CCNet) for obtaining full-image contextual information in a very effective and efficient way. Concretely, for each pixel, a novel criss-cross attention module harvests the contextual information of all the pixels on its criss-cross path. By taking a further recurrent operation, each pixel can finally capture the full-image dependencies. Besides, a category consistent loss is proposed to enforce the criss-cross attention module to produce more discriminative features. Overall, CCNet is with the following merits: 1) GPU memory friendly. Compared with the non-local block, the proposed recurrent criss-cross attention module requires 11× less GPU memory usage. 2) High computational efficiency. The recurrent criss-cross attention significantly reduces FLOPs by about 85% of the non-local block. 3) The state-of-the-art performance. We conduct extensive experiments on semantic segmentation benchmarks including Cityscapes, ADE20K, human parsing benchmark LIP, instance segmentation benchmark COCO, video segmentation benchmark CamVid. In particular, our CCNet achieves the mIoU scores of 81.9%, 45.76% and 55.47% on the Cityscapes test set, the ADE20K validation set and the LIP validation set respectively, which are the new state-of-the-art results. The source codes are available at https://github.com/speedinghzl/CCNet.
translated by 谷歌翻译
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query. All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with query sentence for reasoning. However, we argue that these methods have overlooked two indispensable issues: 1) Boundary-bias: The annotated target segment generally refers to two specific frames as corresponding start and end timestamps. The video downsampling process may lose these two frames and take the adjacent irrelevant frames as new boundaries. 2) Reasoning-bias: Such incorrect new boundary frames also lead to the reasoning bias during frame-query interaction, reducing the generalization ability of model. To alleviate above limitations, in this paper, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames to enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationship among these frames and generate soft labels on boundaries for more accurate frame-query reasoning. Such mechanism is also able to supplement the absent consecutive visual semantics to the sampled sparse frames for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
translated by 谷歌翻译
Data heterogeneity across clients in federated learning (FL) settings is a widely acknowledged challenge. In response, personalized federated learning (PFL) emerged as a framework to curate local models for clients' tasks. In PFL, a common strategy is to develop local and global models jointly - the global model (for generalization) informs the local models, and the local models (for personalization) are aggregated to update the global model. A key observation is that if we can improve the generalization ability of local models, then we can improve the generalization of global models, which in turn builds better personalized models. In this work, we consider class imbalance, an overlooked type of data heterogeneity, in the classification setting. We propose FedNH, a novel method that improves the local models' performance for both personalization and generalization by combining the uniformity and semantics of class prototypes. FedNH initially distributes class prototypes uniformly in the latent space and smoothly infuses the class semantics into class prototypes. We show that imposing uniformity helps to combat prototype collapse while infusing class semantics improves local models. Extensive experiments were conducted on popular classification datasets under the cross-device setting. Our results demonstrate the effectiveness and stability of our method over recent works.
translated by 谷歌翻译
Point cloud completion, as the upstream procedure of 3D recognition and segmentation, has become an essential part of many tasks such as navigation and scene understanding. While various point cloud completion models have demonstrated their powerful capabilities, their robustness against adversarial attacks, which have been proven to be fatally malicious towards deep neural networks, remains unknown. In addition, existing attack approaches towards point cloud classifiers cannot be applied to the completion models due to different output forms and attack purposes. In order to evaluate the robustness of the completion models, we propose PointCA, the first adversarial attack against 3D point cloud completion models. PointCA can generate adversarial point clouds that maintain high similarity with the original ones, while being completed as another object with totally different semantic information. Specifically, we minimize the representation discrepancy between the adversarial example and the target point set to jointly explore the adversarial point clouds in the geometry space and the feature space. Furthermore, to launch a stealthier attack, we innovatively employ the neighbourhood density information to tailor the perturbation constraint, leading to geometry-aware and distribution-adaptive modifications for each point. Extensive experiments against different premier point cloud completion networks show that PointCA can cause a performance degradation from 77.9% to 16.7%, with the structure chamfer distance kept below 0.01. We conclude that existing completion models are severely vulnerable to adversarial examples, and state-of-the-art defenses for point cloud classification will be partially invalid when applied to incomplete and uneven point cloud data.
translated by 谷歌翻译